FOREWORD


The Armed Services Vocational Aptitude Battery (ASVAB) is a
multiaptitude test battery used for selection and classification
of United States military personnel. The purpose of this research
was to examine the effects of massed retesting or practice on the
statistical characteristics of ASVAB subtest and composite
scores. Applicants who fail to qualify because of low ASVAB
scores may be permitted to retake the test battery. The results
of this research showed the level (means) of test scores to
increase somewhat over sessions, but other characteristics of the
battery (variances, reliabilities, covariances) remained stable
after correction for range restriction. That is, individuals will
probably improve their scores with retesting, but the
psychometric properties of those improved scores are not changed.

EDGAR  M.  JOHNSON 
Technical  Director 


THE  EFFECTS  OF  PRACTICE  ON  THE 

ARMED  SERVICES  VOCATIONAL  APTITUDE  BATTERY 

EXECUTIVE SUMMARY


Requirement: 

To  study  the  stability  of  the  statistics  of  the  Armed 
Services  Vocational  Aptitude  Battery  (ASVAB)  over  multiple 
administrations. 


Procedure: 

Five alternate forms of the ASVAB were administered to
fifty-seven men and women of military service age. The objective
was to determine to what extent means and cross-session
correlations are stable over several administrations. Ten
individual subtests and combinations of certain of these
subtests were examined for stability.


Findings: 

The means for this sample were below the national average,
and scores were less dispersed. Means increased over sessions by
.5 standard deviation or more on half the subtests and,
consequently, on most of the composite scores. Correlations for
the subtests and the composites were largely stable over sessions
and were slightly higher later in practice. Reliabilities were
comparable to those of reference populations when adjusted for
the range restriction of the present sample. The implications of
practice effects for paper and pencil, as well as automated,
selection tests are discussed.


Utilization  of  Findings: 

These analyses provide evidence for the differential
stability of composites formed from the ASVAB. The trend toward
increasing means with extended practice should be replicated in a
larger, more representative sample. If cross-validated, such a
replication would support a requirement for accurate record
keeping of prior ASVAB testing of applicants for military
service.

THE EFFECTS OF PRACTICE ON THE

ARMED  SERVICES  VOCATIONAL  APTITUDE  BATTERY 


INTRODUCTION 


Several years ago, Jones (1969) proposed a two-process theory
to describe individual differences in the acquisition of skills.
No inference was made at that time concerning the potential
relevance of that theory to changes in tests of ability. The
theory posited an acquisition phase, in which persons improve at
different rates, and a terminal phase, in which persons reach or
approximate their individual limits. The theory specified that
different persons could be expected to begin at different points
and arrive at their different terminal levels via different
pathways. The theory further implied that, after the terminal
process is reached, persons will cease to change positions
relative to each other, despite additional practice. In other
words, several individuals may approach a task with differing
experience levels and capacities, both of which influence their
initial scores. As practice continues, previous experience will
contribute proportionately less to a person's score, and
individual differences in learning will increasingly influence
his/her test score. As the amount of experimental time increases
relative to previous practice, and as learning progresses,
differences between subjects will become more attributable to
actual differences in underlying capacity or ability, until
finally ability is largely what governs performance scores.

Thus, an inter-session correlation matrix would present a
distinctively different appearance depending on whether
performance early or late in practice were examined. Early in
practice, one would observe the superdiagonal form (Jones, 1969),
in which correlations between adjacent trials are higher than
those between more remote trials. If the theory holds, the
cross-session correlation coefficients would eventually become
constant and symmetrical. When this occurs, no systematic
differences would be present in the matrix as a function of
temporal separation. If the terminal process is not reached, then
the matrix will continue to show superdiagonal form (Jones,
1969), and the task is considered not to have stabilized.
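
The contrast between superdiagonal form and a stabilized matrix can
be made concrete by averaging the correlations at each temporal
separation (lag). The sketch below is illustrative only: both 5 x 5
matrices are hypothetical values, not data from this study, and the
helper name mean_by_lag is our own.

```python
def mean_by_lag(matrix):
    """Average the correlations at each session separation (lag)."""
    n = len(matrix)
    means = {}
    for lag in range(1, n):
        values = [matrix[i][i + lag] for i in range(n - lag)]
        means[lag] = sum(values) / len(values)
    return means


# Hypothetical superdiagonal matrix: r falls off as sessions move apart.
superdiagonal = [[round(0.9 ** abs(i - j), 3) for j in range(5)]
                 for i in range(5)]
# Hypothetical stabilized matrix: every off-diagonal r is the same.
stabilized = [[1.0 if i == j else 0.8 for j in range(5)]
              for i in range(5)]

for name, m in (("superdiagonal", superdiagonal),
                ("stabilized", stabilized)):
    print(name, {lag: round(r, 3) for lag, r in mean_by_lag(m).items()})
```

In the superdiagonal case the lag means decline steadily with
separation; in the stabilized case they are flat, which is the
pattern the theory associates with the terminal process.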

Recently, a program was begun to standardize a performance
test battery applying these principles of differential stability
(Kennedy & Bittner, 1977). In order to study the effects of
adverse environments on humans, it would be desirable that the
test battery assess complex mental abilities which could be
related to elements of military jobs. A natural consequence of
research in this area of environmental stress is that, generally,
each subject serves as his own control over many sessions. In
other words, repeated measures analysis of variance is required,
a differential approach. Moreover, within the context of this
theory, performance on all tasks in the battery would need to
be at terminal levels before an experimental treatment was
introduced. Many batteries have purported to measure primary
mental abilities, and several have been factor analyzed. However,
none of these had been examined in terms of stability of subtests
over sessions; and, generally, the factor analyses which were
performed were conducted on, at most, two replications, a
questionable approach if rate changes occur due to practice (cf.,
Alvares & Hulin, 1972).

Findings from over sixty tests (Kennedy, Carter & Bittner,
1980), which were administered in a fifteen-day repeated measures
paradigm, support the rate-terminal theory of skill acquisition.
Additionally, these findings permit the theory to be generalized
to include other behavioral tests. Specifically, the data
indicate that people do exhibit differential rate processes over
practice when faculties are measured by tests of short-term
memory, grammatical reasoning, learning ability, and several
other cognitive tests (see Kennedy & Harbeson, 1981, for a
review).

Researchers have studied practice effects on intelligence and
ability tests, and it has been known since at least 1920 that
test scores increase (Dunlap & Snyder, 1920; Gundlach, 1926;
Thorndike, 1922). Additionally, reviews of performance changes
on individually administered intelligence tests (Thompson, 1975)
and scholastic aptitude tests (Nader, 1980), when administered
over repeated testings, have suggested that performance on these
tasks also may be less stable than previously considered.

In recent years, there has been an increased interest in
practice and coaching effects (Anastasi, 1981; Catron & Thompson,
1979; Messick & Jungblut, 1981; Whimbey, Carmichael, Jones,
Hunter & Vincent, 1980; Wing, 1980). However, few investigations
have been conducted which involve more than two or three
replications. What evidence there is suggests that repeated
testing may produce appreciable effects on mean test scores.
Mackaman, Bittner, Harbeson, Kennedy and Stone (1982) found that
inter-session correlations on the Wonderlic were stable over 18
replications, but the scores increased, on the average, 21
percentile points. This suggests that exposure history is an
important variable with regard to the testing and subsequent
assignment of personnel.

The Armed Services Vocational Aptitude Battery (ASVAB)
possesses many of the same types of test items as the Wonderlic
(Kass, Mitchell, Grafton & Wing, 1982). In addition, other
tests, similar to the subtests found in ASVAB, have not always
differentially stabilized after many trials (cf., Kennedy et al.,
1981), and rarely have tests exhibited mean or differential
stability from the first session. The importance of this lack of
stabilization should not be overlooked. Various combinations of
ASVAB subtests are used for counseling (Fischl, Ross & McBride,
1979) and for assignment to service schools (Sims & Hiatt, 1981;
Swanson, 1979). In a review of 95 different Navy enlisted
ratings, Carter and Biersner (1982) showed how abilities from
ASVAB and other aptitude test batteries would map onto disparate
Navy jobs. If a test were unstable, then predictions made on the
basis of scores from it would be less accurate. Thus the value
of prediction would be lessened.

Various subgroups of the population with whom the ASVAB is
used may vary with respect to amount of experience in taking
standardized tests. It might be expected that individuals with
less sophistication in test-taking skills would take longer to
produce a stable pattern of scores. Moreover, the initial test
scores of these individuals would be less effective in predicting
later performance. Additionally, racial differences in repeated
measures of test performance were reported by Dyer (1970). He
found that in uncoached practice sessions, black college students
showed a statistically significant increase over white students
in three administrations of alternate forms of a standardized
test of reasoning ability. An investigation of repeated
administrations of the ASVAB, therefore, should include
examination of performance which may be unique to particular
groups of individuals with whom the test may be used.

It was the purpose of this investigation to determine whether
practice modified performance on alternate forms of the ASVAB.
Practice effects would be observed as changes in means, variances
and cross-session correlations. Stability of ASVAB would be
determined according to the extent to which the test met
standards developed in repeated measures experimentation and
included group and differential criteria. It was hypothesized
that improvement would continue over sessions, and that some
tests would be differentially unstable.


METHOD 


Subjects 

The subjects were 57 men and women enrolled as trainees in
the Job Corps Center, Shreveport, LA. Thirty-four subjects were
male (29 Black and 5 White), and 23 were female (19 Black and 4
White). Effort was made to assure maximum response by Center
trainees. It was explained that subjects would be required to
take the ASVAB on five consecutive mornings and that the results
would be used for research purposes. Additionally, trainees were
told that their scores from the first day of testing could be
used for determining their eligibility for enlistment in the
armed services, if they so desired. It was emphasized that
participation in this project would not obligate subjects to
consideration for military service. Trainees were also told that
they would be paid for their participation contingent upon
completion of all five days of testing. The first 60 volunteers
were selected. On the second day of testing, two subjects dropped
out, and a third quit on the fourth day. All three left due to
unforeseen work, school or family circumstances.


Apparatus  and  Procedure 


Five forms of the ASVAB were administered from 8:00 AM to
12:00 noon in a group setting for five consecutive days. On each
day of testing all subjects took the same form of the ASVAB. The
order of administration was: Form 8b, 9a, 9b, 10a, 10b. These
five forms are considered of equal difficulty (Ree, Mullins,
Mathews & Massey, 1982). Forms of the ASVAB having the same
number also had identical items comprising the subtests of:

General Science (GS)          Mathematics Knowledge (MK)
Coding Speed (CS)             Mechanical Comprehension (MC)
Auto/Shop Information (AS)    Electronics Information (EI)

Different across forms were:

Paragraph Comprehension (PC)  Numerical Operations (NO)
Arithmetic Reasoning (AR)     Word Knowledge (WK)

For additional information the reader is referred to the
reference works of Ree et al. (1982) and Kass et al. (1982).
Administration followed standard procedures and was conducted by
members of the Shreveport Military Enlistment Processing Station
(MEPS). Neither coaching nor feedback was given to subjects
during the days of testing.


Scoring


Subjects'  responses  were  made  on  answer  sheets  which  were 
scored  by  computer  at  the  MEPS  on  the  afternoon  of  each  day  of 
the  project.  ASVAB  subtest  results  were  reported  in  raw  score 
form.  These  different  subtests  were  combined  to  form  composite 
scores  for  AFQT  and  for  ten  aptitude  areas.  (See  Table  1.) 
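
As an illustration of how the area composites combine subtests, the
sketch below sums standard subtest scores using the definitions
transcribed from Table 1. The examinee's scores are hypothetical,
the function name is our own, and the further conversion of these
sums to reported composite or AFQT scores is not described in this
report and is not attempted here.

```python
# Composite definitions transcribed from Table 1 (ASVAB 8/9/10).
COMPOSITES = {
    "CO": ["AR", "AS", "MC", "CS"],
    "FA": ["AR", "MK", "MC", "CS"],
    "EL": ["AR", "EI", "MK", "GS"],
    "OF": ["NO", "VE", "MC", "AS"],
    "SC": ["NO", "CS", "VE", "AS"],
    "MM": ["NO", "EI", "MC", "AS"],
    "GM": ["MK", "EI", "GS", "AS"],
    "CL": ["NO", "CS", "VE"],
    "ST": ["VE", "MK", "MC", "GS"],
    "GT": ["VE", "AR"],
}


def composite_sums(standard_scores):
    """Sum the standard subtest scores that make up each composite."""
    return {name: sum(standard_scores[s] for s in parts)
            for name, parts in COMPOSITES.items()}


# Hypothetical standard scores for one examinee (not study data).
example = {"AR": 45, "AS": 50, "MC": 48, "CS": 52, "MK": 44,
           "EI": 47, "GS": 46, "NO": 51, "VE": 43}
sums = composite_sums(example)
print(sums["GT"], sums["CL"])
```

Because several composites share subtests, a practice gain on a
single subtest such as Coding Speed carries into every composite
that includes it.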
ASVAB Subtests
Means 

Significant linear trends, indicating an improvement with
practice in the absence of feedback, occurred with four test
sections: Coding Speed, Numerical Operations, Mathematics
Knowledge and Mechanical Comprehension. The means and associated
p-values for linear and quadratic relationships are presented in
Table 2. The most dramatic increases were for Coding Speed and
Numerical Operations, where the average fifth test performance
exceeded the average of the first test performance by 48.3% and
27.0%, respectively. No test showed a significant drop with
practice. However, both Word Knowledge and Paragraph
Comprehension showed significant quadratic (U-shaped) changes
over sessions, which suggests possible motivational deficits on
the intermediate Days 2, 3 and 4. The significant quadratic
component for Coding Speed was apparently due to the rapid
increase in mean score from Day 1 to Day 2, followed by a slower
increase thereafter.
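
The percentage gains and the shape of the trends can be checked
directly from the session means in Table 2. The sketch below uses
the standard orthogonal polynomial coefficients for five equally
spaced sessions. It describes the shape of the group means only; the
significance tests in Table 2 require subject-level data and are not
reproduced. Function names are our own.

```python
# Session means transcribed from Table 2 (forms 8B, 9A, 9B, 10A, 10B).
MEANS = {
    "CS": [26.7, 33.6, 36.0, 34.9, 39.6],
    "NO": [24.1, 26.4, 26.4, 29.4, 30.6],
    "WK": [13.0, 12.7, 12.4, 10.6, 13.1],
    "PC": [6.6, 4.9, 5.6, 4.8, 6.7],
}
# Orthogonal polynomial coefficients for five equally spaced sessions.
LINEAR = [-2, -1, 0, 1, 2]
QUADRATIC = [2, -1, -2, -1, 2]


def percent_gain(means):
    """Percent change from the first to the fifth session mean."""
    return 100.0 * (means[-1] - means[0]) / means[0]


def contrast(means, coefficients):
    """Weighted sum of session means for one trend component."""
    return sum(c * m for c, m in zip(coefficients, means))


for test, means in MEANS.items():
    print(test,
          round(percent_gain(means), 1),
          round(contrast(means, LINEAR), 1),
          round(contrast(means, QUADRATIC), 1))
```

A positive quadratic contrast, as for Word Knowledge and Paragraph
Comprehension, corresponds to the U-shaped pattern described above:
the middle sessions fall below the first and last.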

The mean scores on the first administration are slightly more
than one standard deviation below those reported by others (Kass
et al., 1982; Ree et al., 1982). However, for those tests which
later showed improvement (viz., CS, NO, MK, MC), the mean scores
in subsequent sessions are slightly less than a standard
deviation lower than those found in these other experiments. The
standard deviations were constant over sessions and about 75% the
size of those in the larger samples (Kass et al., 1982; Ree et
al., 1982).

Correlations 

The  intercorrelations  across  five  repeated  administrations 
of  each  subtest  of  the  ASVAB  are  presented  in  Table  3.  The 
sample  size  obtained  (N  =  57)  was  too  small  to  permit  reliable 
inferences from factor analyses.

For  five  of  the  tests  (General  Science,  Arithmetic  Reasoning, 
Word  Knowledge,  Numerical  Operations  and  Coding  Speed),  the 
highest  correlations  approximate  conventional  reliability 
estimates.  However,  for  the  remaining  five  tests  (Paragraph 
Comprehension,  Auto/Shop  Information,  Mathematics  Knowledge, 
Mechanical Comprehension and Electronics Information), the
"highest" figures are lower than conventional reliability
estimates (cf., Kennedy et al., 1980). The latter five tests are
stable  in  the  sense  that  all  five  administrations  measure  the 
same  underlying  variation  (cf.,  Jones  &  Kennedy,  1983).  The  ASVAB 
composites,  as  would  be  expected,  have  much  higher  reliabilities 
and  intersession  correlations  (see  Table  4). 

One or more of these subtests are included in nine of the ten
composites, the exception being ST.

The correlations improved over the five practice sessions for
nine out of ten subtests, the exception being Electronics
Information. The average intersession correlation for the first
three days (1, 2 and 3) was compared to the average of the last
three days (i.e., 3, 4 and 5). Although this is not an
independent comparison, it is instructive to compare the means.
The mean improvement in reliability correlation was small (viz.,
r = .61 versus .68) but evident, and in some cases non-trivial
(e.g., CO r = .72 vs .84). The correlations, corrected for
attenuation due to range restriction following the equation in
Sims and Hiatt (1981), are consistent with those reported in
Friedman, Streicher, Wing and Grafton (1982). The later days'
correlations (days 4 and 5) are all slightly higher; the early
days' (1 and 2) are approximately the same or else higher. These
values appear in Appendix A.
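
The equation from Sims and Hiatt (1981) is not reproduced in this
report. As an illustration only, the sketch below applies a common
textbook correction of a reliability coefficient for range
restriction, which assumes that error variance is the same in the
restricted and unrestricted groups. It should not be read as the
equation actually used. The SD ratio of 0.75 reflects the earlier
statement that the sample standard deviations were about 75% the
size of those in the larger samples; the function name is our own.

```python
def correct_reliability(r_restricted, sd_ratio):
    """Estimate unrestricted reliability from a restricted-sample value.

    sd_ratio is the restricted SD divided by the unrestricted SD.
    Assumes equal error variance in both groups (an assumption of
    this sketch, not a procedure confirmed by the report).
    """
    return 1.0 - (sd_ratio ** 2) * (1.0 - r_restricted)


for label, r in (("days 1-3", 0.61), ("days 3-5", 0.68)):
    print(label, round(correct_reliability(r, 0.75), 2))
```

Under these assumptions the average intersession correlations of
.61 and .68 correspond to roughly .78 and .82 in a population with
unrestricted variance.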


Sex 


Sex  only  approached  significance  on  one  subtest,  Mechanical 
Comprehension,  F  (1,53)  =  3.25,  p  =  .0772;  the  mean  for  females 
was  7.71  and  the  mean  for  males  was  9.34. 


ASVAB  Composites 


Means 

Linear  and  quadratic  trends  are  reported  in  Table  4. 
Significant  trends  occurred  for  the  Armed  Forces  Qualification 
Test  (AFQT)  score  and  for  all  composites  but  General  Technical 
(GT)  and  Skilled  Technical  (ST).  In  the  case  of  General 
Maintenance  (GM)  and  Electronics  Repair  (EL),  the  increase  was 
small  but  significant  (C.2  standard  deviation).  In  the  first 
session,  the  composite  score  which  occurred  one  standard 
deviation  above  this  group's  mean  was  76.2.  After  five  sessions, 
the  composite  score  one  standard  deviation  above  this  group's 
mean was 80.6 (p < .001).


Correlations 

Table  4  contains  the  cross-session  correlations  for  the  ten 
area  composites  and  for  AFQT.  The  overall  impression  is  of  high 
correlations  and  general  stability,  although  the  average 
intersession  correlation  for  the  last  three  days  (3,  4  and  5)  is, 
in  all  cases  but  one  (Surveillance/Communications),  higher  than 
the  average  intersession  correlation  for  the  first  three  days 
(i.e., 1, 2 and 3).
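
The early-versus-late comparison used here can be sketched as
follows. The correlation matrix is hypothetical and stands in for
one row block of Table 4; the function name is our own. Day 3
belongs to both averages, so the two values are not independent.

```python
def mean_offdiagonal(matrix, sessions):
    """Average the correlations among the given sessions (0-based)."""
    pairs = [(i, j) for i in sessions for j in sessions if i < j]
    return sum(matrix[i][j] for i, j in pairs) / len(pairs)


# Hypothetical cross-session correlations for one composite
# (illustrative values, not study data).
r = [
    [1.00, 0.68, 0.70, 0.69, 0.71],
    [0.68, 1.00, 0.72, 0.73, 0.72],
    [0.70, 0.72, 1.00, 0.78, 0.80],
    [0.69, 0.73, 0.78, 1.00, 0.82],
    [0.71, 0.72, 0.80, 0.82, 1.00],
]
early = mean_offdiagonal(r, [0, 1, 2])  # days 1, 2 and 3
late = mean_offdiagonal(r, [2, 3, 4])   # days 3, 4 and 5
print(round(early, 2), round(late, 2))
```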


Summary of Results

The means and dispersions of scores for this population were
below the national average. On half the subtests and,
consequently, on most of the composite scores, means increased
over sessions by .5 standard deviation or more. Correlations for
the subtests and the composites were largely stable over sessions
and were slightly higher later in practice. Reliability
correlations were comparable to those of reference populations
when adjusted for the range restriction of the present sample.


DISCUSSION


Stability

The original purpose of the present research was to determine
whether repeated administrations of forms of the ASVAB would
produce evidence of stability of scores. This question is of
interest for selection, classification and prediction in general,
but these issues have different relevance, depending on whether
representative or exceptional populations are studied. The
availability of a small (N=60) Job Corps group encouraged us to
research this question in such a population. It was recognized
that information derived from a homogeneous sample would be less
generalizable than that from a more heterogeneous one.
However, the increase in mean scores which was expected to occur
in such a group might be more likely to emphasize transition
across boundaries of administrative decisions (e.g., selection
cut-off scores and service school assignment).

ASVAB was administered five separate times to fifty-seven men
and women of military service age. Ten individual subtests, the
derived ASVAB area composites (N=10) and the Armed Forces
Qualification Test (AFQT) were examined for group and
differential stability. The means and dispersions of scores for
this sample were below the national average. Means increased
over sessions .5 standard deviation or more on half the subtests
and, consequently, on most of the composite scores. In the
present experiment, differential stabilization (Jones, Kennedy &
Bittner, 1981) with practice does not appear to be a problem in
ASVAB. All ten subtests were more or less differentially stable
on the first administration. The same was true for the ten
aptitude area composites. In neither the subtests nor the area
composites was there any appreciable differential change with
practice, although mean changes on repeated administrations of
the ASVAB did occur.

Mean changes are an index of group stability. Four of the
subtests showed a significant increasing linear trend with
practice. Four of the area composites showed increases from the
first to the fifth administration of .5 standard deviation or
more. These changes are sufficient to warrant some concern,
although they are not surprising in light of the Mackaman et al.
(1982) finding of almost 21 percentile points improvement with
practice in a population whose mean score began at the 50th
percentile. For example, if II were used as a cutoff score for
AFQT: a) 1/3 of those in the present experiment who initially
failed to achieve this score would later surpass it at least
once; and b) 1/6 of them would pass more than once.
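
The kind of tally behind these fractions can be sketched as below.
The scores and the cutoff are hypothetical, chosen only to show the
counting, and the function name is our own. The sketch treats a
score equal to the cutoff as passing, which is an assumption.

```python
def crossing_rates(afqt_by_subject, cutoff):
    """Among subjects below the cutoff on day 1, return the fraction
    who meet it at least once later and the fraction who meet it
    more than once."""
    initial_failers = [s for s in afqt_by_subject if s[0] < cutoff]
    if not initial_failers:
        return 0.0, 0.0
    passes = [sum(score >= cutoff for score in s[1:])
              for s in initial_failers]
    at_least_once = sum(p >= 1 for p in passes) / len(initial_failers)
    more_than_once = sum(p >= 2 for p in passes) / len(initial_failers)
    return at_least_once, more_than_once


# Hypothetical five-session AFQT scores for six subjects
# (illustrative values, not study data).
scores = [
    [10, 12, 11, 13, 12],
    [14, 15, 17, 15, 14],
    [13, 14, 16, 18, 17],
    [9, 10, 10, 11, 12],
    [20, 22, 21, 24, 25],
    [12, 13, 13, 14, 15],
]
once, repeated = crossing_rates(scores, cutoff=16)
print(once, repeated)
```

Keeping a record of prior administrations, as recommended in this
report, is what makes such a tally possible in operational testing.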

Two questions about time lapse need to be answered: whether
the same sort of improvement would occur a) if the five
administrations of the present study were distributed over weeks
or months instead of days, and b) in a more representative
sample. We would predict that the present improvement is near
optimum, or might be better if administered within one month. In
our view, similar relative improvements (standard scores) would
be observed in a more representative population.


A  word,  perhaps,  is  in  order  regarding  the  possibility  that 
the results observed are due to regression. Men and women who
enter  the  Job  Corps  do  so,  at  least  in  part,  because  of  poor 
performance  in  school  and  on  the  job.  They  are  selected,  if  you 
like,  on  the  basis  of  previous  poor  performance.  To  the  extent 
that  this  previous  poor  performance  may  have  involved  transient 
(error)  components,  the  possibility  exists  that  the  average  error 
score  in  the  sample  studied  may  be  negative  at  first  testing.  If 
so,  the  group  mean  would  be  expected  to  increase  at  retesting,  as 
observed.  However,  it  would  not  be  expected  to  increase  regularly 
with  subsequent  testing,  as  also  happened.  The  possibility  of  a 
regression  effect  cannot,  therefore,  be  excluded;  but  it  seems 
unlikely  to  account  for  more  than  part  of  the  observed  increase 
with  multiple  retesting. 


Implications  for  Selection 

The  Armed  Forces  Qualification  Test  (AFQT)  score  is  employed 
in  preliminary  screening.  It  is  used  to  classify  individuals  into 
five  mental  categories  in  order  to  determine  eligibility  for 
enlistment  and  particular  job  training  (Mathews  &  Ree,  1982). 
Sims  and  Hiatt  (1981)  concluded  that  83%  of  the  predictive 
efficiency  of  the  ASVAB  is  contained  within  the  AFQT.  Were 
abbreviated  versions  of  the  ASVAB  created  in  order  to  screen 
individuals  for  more  comprehensive  testing,  it  is  likely  that 
these  subtests,  or  ones  like  them,  would  be  candidates  for 
automated  test  administration  through  microcomputer.  Therefore, 
it may be advisable to determine whether such improvement would
occur  on  AFQT  scores  in  a  sample  whose  mean  scores  are  more 
nearly  like  those  for  average  Army  recruits. 

It should be noted, however, that this improvement should not
be considered to be evidence of differential instability. If the
latter were to occur, persons who scored lower initially might
score higher later, and the converse. In the present experiment,
the movement of subjects toward increasing scores with practice
was largely uniform. Therefore, if movement across boundaries is
a problem for ASVAB utilization, it will be necessary to monitor
the number of times the test is taken. Thus, better predictive
validities might be available from later test performances,
because the correlations are higher.


Suggestions  for  Future  Research 

Several of the correlations for aptitude area composites tend
to increase with practice, a finding which has been reported many
times before in repeated measures testing (cf., Kennedy et
al., 1981). We do not believe that the restricted range of the
present sample influenced this improvement; however, this finding
should be checked. The result implies that improved reliability
correlations might be available in later sessions; such
improvement may be useful for classification. It is possible that
certain persons may profit more than others by extra test taking.
For example, persons new to test taking, who may qualify as
borderline acceptable for the military service schools which have
less stringent requirements, could be misassigned to these latter
occupations when they could also be successful in more demanding
jobs. While it is recognized (Schmidt & Hunter, 1981) that
"selecting from the top down maximizes the productivity of
employees selected" (p. 1130), those same authors propose greater
relevance for a classification model than a selection model
(Hunter & Schmidt, 1982). According to this view, individuals
should be assigned to jobs based on the criterion of maximizing
productivity. The prospect that improved differential predictive
validities from disparate composites may be available with
increased practice on the ASVAB subtests suggests that such an
investigation should be performed with a larger sample than we
used, should include persons who are more representative of
an incoming military population, and should use longer time
intervals. It is not unlikely that extra testing might expand the
service pool (Sims & Hiatt, 1981) from the standpoint of
successful service school assignment.


Future  Trends  in  Testing 

Although paper and pencil tests of cognitive ability have
strong roles in selection and classification, the advent of
microprocessors likely will have an influence on automating
future efforts in this area. If test automation of ASVAB proceeds
further than simply translating the existing tests to
microcomputer/video format, it may be helpful to study practice
effects. This helpfulness depends on exploiting the possibilities
of the new technology by developing new tests, tests that involve
more elements of a perceptual, information processing,
psychomotor and decision-making sort. Indeed, it is considered in
some places (e.g., O'Leary, 1979) that a "job sample" approach
not only has a higher likelihood of success, but is more apt to
be fair than some of the tests which are now employed in
selection. In view of the difficulties in the use of paper and
pencil tests in classification (cf., Eaton, Bessemer &
Kristiansen, 1979), it is suggested that video games have strong
prospects to fill such a role. In one experiment (Lintern &
Kennedy, 1982), which was later cross-validated (Westra, 1983), a
video game correlated with a full-scale simulation of a night
carrier landing as much as the test-retest reliability of the
criterion would allow. It is offered that microcomputer video
games might provide a fertile target of opportunity (Jones,
Kennedy & Bittner, 1981). It should be noted that, when
automated, these and other such tests usually involve implicit
knowledge of results, which might be expected to show greater
changes in the mean than were found in the present research.
Consequently, it is likely that, with practice, they will show
appreciable differential change (Jones, 1981), as well. The
promising possibility of introducing more heterogeneity into the
ASVAB also will probably revive stabilization-with-practice as a
major concern.
Table 1

APTITUDE AREA COMPOSITES USED IN ASVAB 8/9/10

                                     Subtests Used in
                                     Computing Composites
Aptitude Area Composite              for ASVAB 8/9/10 (1)

Combat (CO)                          AR+AS+MC+CS (2)
Field Artillery (FA)                 AR+MK+MC+CS
Electronics (EL)                     AR+EI+MK+GS
Operators/Foods (OF)                 NO+VE(3)+MC+AS
Surveillance/Communications (SC)     NO+CS+VE+AS
Motor Maintenance (MM)               NO+EI+MC+AS
General Maintenance (GM)             MK+EI+GS+AS
Clerical (CL)                        NO+CS+VE
Skilled Technical (ST)               VE+MK+MC+GS
General Technical (GT)               VE+AR

Note: Table adapted from a table originally developed by
Ms. Frances Grafton and Dr. Milt Maier.

(1) Standard subtest scores are used in computation.

(2) Abbreviations stand for the following:

    AR  Arithmetic Reasoning
    AS  Auto & Shop Information
    CS  Coding Speed
    EI  Electronics Information
    MC  Mechanical Comprehension
    MK  Math Knowledge
    NO  Numerical Operation
    GS  General Science

(3) Verbal (VE) is a standard score conversion of the sum of raw
    scores for Word Knowledge (WK) and Paragraph Comprehension (PC).
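
As an illustration of how the composite definitions in Table 1 combine subtests, the following minimal Python sketch sums standard subtest scores for a named composite. It covers only the summation step stated in the table; any later conversion of the sum to the reported composite scale is not described here and is not attempted. The function name `composite_sum` and the example examinee (every standard score set to 50) are hypothetical.

```python
# Composite definitions transcribed from Table 1 (ASVAB 8/9/10).
COMPOSITES = {
    "CO": ("AR", "AS", "MC", "CS"),
    "FA": ("AR", "MK", "MC", "CS"),
    "EL": ("AR", "EI", "MK", "GS"),
    "OF": ("NO", "VE", "MC", "AS"),
    "SC": ("NO", "CS", "VE", "AS"),
    "MM": ("NO", "EI", "MC", "AS"),
    "GM": ("MK", "EI", "GS", "AS"),
    "CL": ("NO", "CS", "VE"),
    "ST": ("VE", "MK", "MC", "GS"),
    "GT": ("VE", "AR"),
}


def composite_sum(name, standard_scores):
    """Sum the standard subtest scores that enter one composite.

    This is only the raw sum; rescaling to the reported composite
    metric is outside the scope of this sketch.
    """
    return sum(standard_scores[subtest] for subtest in COMPOSITES[name])


# Hypothetical examinee with a standard score of 50 on every subtest.
example = {s: 50 for s in ("AR", "AS", "CS", "EI", "GS", "MC", "MK", "NO", "VE")}

print(composite_sum("GT", example))  # 100 (two subtests)
print(composite_sum("CO", example))  # 200 (four subtests)
```

Because the composites share subtests (AS, for example, enters five of them), a practice gain on a single subtest carries into every composite that uses it.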


Table 2

MEANS AND STANDARD DEVIATIONS FOR SUCCESSIVE
TEST ADMINISTRATIONS ORDERED BY STRENGTH OF LINEAR TREND
WITH LINEAR AND QUADRATIC PROBABILITIES FOR 10 SUBTESTS

Means

Section                   8B     9A     9B    10A    10B   Linear    Quad

Coding Speed (CS)        26.7   33.6   36.0   34.9   39.6   .0000   .0081
Numerical Oper (NO)      24.1   26.4   26.4   29.4   30.6   .0000   .7602
Math Know (MK)            6.8    6.7    7.2    8.4    7.7   .0017   .7758
Mech Comp (MC)            8.1    7.7    7.6    8.7    8.5   .0200   .1316
Auto & Shop Info (AS)     7.3    7.6    7.8    7.8    8.1   .0757   .6083
Gen Science (GS)          8.6    8.1    ?.9    8.0    8.1   .0824   .2023
Word Know (WK)           13.0   12.7   12.4   10.6   13.1   .1698   .0041
Electronics Info (EI)     6.5    6.3    6.7    6.8    7.0   .2280   .4727
Arithmetic Reas (AR)      8.9    8.4   10.0    9.4    9.3   .2730   .3533
Paragraph Comp (PC)       6.6    4.9    5.6    4.8    6.7   .5982   .0001

Standard Deviations

Section                   8B     9A     9B    10A    10B

Coding Speed (CS)        13.9   15.2   16.6   14.7   15.2
Numerical Oper (NO)       9.4   11.1    9.2   10.9   10.9
Math Know (MK)            2.7    2.3    2.7    2.6    3.0
Mech Comp (MC)            3.2    3.0    3.2    3.3    3.6
Auto & Shop Info (AS)     2.7    3.4    3.0    3.3    3.2
Gen Science (GS)          3.5    3.8    3.9    3.8    4.0
Word Know (WK)            5.4    6.0    6.1    5.1    5.5
Electronics Info (EI)     2.5    2.9    2.4    3.2    3.0
Arithmetic Reas (AR)      3.9    3.3    3.4    3.5    4.2
Paragraph Comp (PC)       3.0    2.7    3.3    2.9    2.9
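
The linear and quadratic columns of Table 2 report probabilities for trend over the five administrations. The sketch below shows, under stated assumptions, how linear and quadratic trend contrasts can be formed from five equally spaced session means using the standard orthogonal polynomial weights. It is not the analysis that produced the tabled probabilities, which would require the individual scores; it only illustrates the direction and shape that the contrasts summarize. The function name `contrast` is hypothetical, and the means are the Coding Speed row of Table 2.

```python
# Orthogonal polynomial weights for five equally spaced occasions.
LINEAR = (-2, -1, 0, 1, 2)
QUADRATIC = (2, -1, -2, -1, 2)


def contrast(means, weights):
    """Weighted sum of the session means for one trend component."""
    return sum(w * m for w, m in zip(weights, means))


# Coding Speed means for forms 8B, 9A, 9B, 10A, 10B (Table 2).
cs_means = (26.7, 33.6, 36.0, 34.9, 39.6)

lin = contrast(cs_means, LINEAR)
quad = contrast(cs_means, QUADRATIC)

# A positive linear value indicates means rising across sessions;
# a negative quadratic value indicates gains that flatten with practice.
print(round(lin, 1), round(quad, 1))  # 27.1 -7.9
```

A significance test of either contrast needs the within-subject error term, which cannot be recovered from the means and standard deviations alone.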

Table 3

CROSS-SESSION CORRELATIONS
OF THE TEN TEST SCORES (N = 57)

General Science
               Session
          2     3     4     5
Ses. 1   .66   .68   .72   .72
     2         .74   .73   .70
     3               .76   .79
     4                     .82

Word Knowledge
               Session
          2     3     4     5
Ses. 1   .70   .73   .67   .79
     2         .80   .71   .80
     3               .78   .83
     4                     .77

Numerical Operation
               Session
          2     3     4     5
Ses. 1   .85   .86   .85   .86
     2         .90   .90   .87
     3               .90   .86
     4                     .93

Auto & Shop Information
               Session
          2     3     4     5
Ses. 1   .44   .45   .66   .67
     2         .58   .41   .39
     3               .47   .54
     4                     .72

Mechanical Comprehension
[correlations not recoverable from the source]

Arithmetic Reasoning
               Session
          2     3     4     5
Ses. 1   .46   .56   .71   .63
     2         .54   .51   .65
     3               .58   .70
     4                     .73

Paragraph Comprehension
               Session
          2     3     4     5
Ses. 1   .62   .59   .58   .58
     2         .69   .47   .69
     3               .60   .66
     4                     .57

Coding Speed
               Session
          2     3     4     5
Ses. 1   .80   .73   .73   .67
     2         .86   .82   .77
     3               .85   .80
     4                     .86

Mathematical Knowledge
               Session
          2     3     4     5
Ses. 1   .24   .46   .52
     2         .39   .36   .26
     3               .54   .42
     4                     .46
[one Session 1 entry is missing in the source; the column
placement of .46 and .52 is uncertain]

Electronics Information
[correlations not recoverable from the source]
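
One way to read the cross-session matrices in Table 3 is to compare correlations between adjacent sessions with those between sessions further apart. The following minimal sketch, using the General Science entries, averages the correlations at each session separation. The function name `mean_r_by_lag` is hypothetical, and the plain arithmetic mean is a simplification (averaging after a Fisher z transformation is the more usual choice); it is offered as an aid to inspection and not as a procedure used in this research.

```python
from collections import defaultdict

# General Science cross-session correlations from Table 3,
# keyed by (earlier session, later session).
gs = {
    (1, 2): .66, (1, 3): .68, (1, 4): .72, (1, 5): .72,
    (2, 3): .74, (2, 4): .73, (2, 5): .70,
    (3, 4): .76, (3, 5): .79,
    (4, 5): .82,
}


def mean_r_by_lag(corrs):
    """Average the correlations at each separation between sessions."""
    by_lag = defaultdict(list)
    for (i, j), r in corrs.items():
        by_lag[j - i].append(r)
    return {lag: sum(v) / len(v) for lag, v in sorted(by_lag.items())}


for lag, r in mean_r_by_lag(gs).items():
    print(f"sessions {lag} apart: mean r = {r:.3f}")

# Adjacent-session correlations in order of practice (1-2, 2-3, 3-4, 4-5).
adjacent = [gs[(k, k + 1)] for k in range(1, 5)]
print(adjacent)
```

For General Science the adjacent-session values rise from .66 to .82 across practice, which is the pattern the lag summary alone would hide.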


Table 4

MEANS AND STANDARD DEVIATIONS FOR FIVE SUCCESSIVE
TEST ADMINISTRATIONS ORDERED BY STRENGTH OF LINEAR TREND
WITH LINEAR AND QUADRATIC PROBABILITIES FOR 10 COMPOSITES

Means

          8B     9A     9B    10A    10B   Linear    Quad

AFQT     12.6   12.5   13.6   12.0   15.5   .0000   .0004
CL       69.3   72.5   72.8   73.4   79.0   .0000   .0752
MM       65.2   66.2   67.0   70.2   70.8   .0000   .6040
SC       66.5   68.9   70.3   70.0   74.5   .0000   .3168
CO       66.2   69.1   70.3   70.7   72.4   .0000   .2456
FA       68.6   72.2   73.4   75.6   76.1   .0000   .0890
OF       64.9   64.7   65.8   66.8   70.0   .0000   .0080
GM       64.9   65.1   65.1   67.0   66.9   .0112   .6220
EL       66.2   67.0   67.3   69.9   68.0   .0152   .5240
ST       66.0   63.0   63.5   64.7   66.5   .2034   .0010
GT       67.6   65.6   67.7   63.2   68.0   .5655   .0202

Standard Deviations

          8B     9A     9B    10A    10B

AFQT     10.2   10.5   11.8   10.8   12.5
CL       13.9   15.4   16.6   15.6   16.5
MM       11.4   11.4   11.3   12.2   11.7
SC       11.6   12.0   14.0   13.7   13.6
CO       12.4    9.9   10.2   11.6   11.9
FA       12.3   11.0   11.8   11.7   12.4
OF       11.5   11.2   11.8   11.7   12.4
GM       11.5   11.8   11.8   12.7   12.7
EL       12.4   12.7   11.8   12.9   13.5
ST       10.6   11.7   13.4   11.0   12.6
GT       12.8   12.9   13.2   13.1   13.5
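
To express the mean changes in Table 4 in standard deviation units, one simple option is to divide the difference between the last and first session means by the first-session standard deviation. The sketch below does this for three composites. The choice of the first-session standard deviation as the denominator is an assumption made here for illustration, and the function name `standardized_gain` is hypothetical.

```python
def standardized_gain(first_mean, last_mean, first_sd):
    """Gain from the first to the last session in first-session SD units."""
    return (last_mean - first_mean) / first_sd


# (mean on 8B, mean on 10B, SD on 8B) taken from Table 4.
rows = {
    "CL": (69.3, 79.0, 13.9),
    "GM": (64.9, 66.9, 11.5),
    "GT": (67.6, 68.0, 12.8),
}

for name, (m_first, m_last, sd_first) in rows.items():
    gain = standardized_gain(m_first, m_last, sd_first)
    print(f"{name}: {gain:.2f} SD")
```

On this index Clerical, which includes the two speeded subtests, gains roughly seven tenths of a standard deviation, while General Technical is essentially unchanged.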

Table 5

INTER-ADMINISTRATION CORRELATIONS
OF THE TEN AREA COMPOSITES AND AFQT

Gen Maintenance (GM)
               Session
          2     3     4     5
Ses. 1   .67   .83   .79   .82
     2         .73   .77   .74
     3               .81   .82
     4                     .85

Clerical (CL)
               Session
          2     3     4     5
Ses. 1   .91   .88   .88   .88
     2         .90   .91   .91
     3               .94   .87
     4                     .91

Surv/Comm (SC)
               Session
          2     3     4     5
Ses. 1   .90   .87   .85   .85
     2         .89   .87   .87
     3               .90   .87
     4                     .88

Field Artil (FA)
               Session
          2     3     4     5
Ses. 1   .64   .75   .78   .76
     2         .78   .74   .76
     3               .87   .81
     4                     .85

Skilled Tech (ST)
               Session
          2     3     4     5
Ses. 1   .75   .79   .82   .83
     2         .79   .80   .81
     3               .83   .81
     4                     .86

Gen Tech (GT)
               Session
          2     3     4     5
Ses. 1   .69   .69   .76   .72
     2         .77   .76   .86
     3               .77   .84
     4                     .79

Electronics (EL)
               Session
          2     3     4     5
Ses. 1   .66   .80   .75   .85
     2         .74   .81   .73
     3               .82   .79
     4                     .84

Motor Maintenance (MM)
               Session
          2     3     4     5
Ses. 1   .79   .66   .72   .81
     2         .62   .75   .77
     3               .65   .65
     4                     .83

Combat (CO)
               Session
          2     3     4     5
Ses. 1   .67   .74   .82   .75
     2         .76   .76   .73
     3               .83   .81
     4                     .87

Operators/Foods (OF)
               Session
          2     3     4     5
Ses. 1   .80   .82   .87   .84
     2         .79   .80   .78
     3               .87   .85
     4                     .91

Table 5 (Cont.)




Armed Forces Qualification Test (AFQT)
               Session
          2     3     4     5
Ses. 1   .90   .91   .92   .92
     2         .94   .92   .91
     3               .93   .93
     4                     .92

APPENDIX A


A COMPARISON OF
TEST/RETEST CORRELATIONS FOR THE PRESENT SAMPLE (N=57)
EARLY (SESSIONS 1, 2) AND LATE (SESSIONS 4, 5) IN PRACTICE
AND A REFERENCE SAMPLE TESTED TWICE

(Correlations Corrected for Restriction in Range)

Subtest             Sessions

GS        .8216    .7887    .8929
AR        .8681    .8649    .9258
WK        .8388    .8392    .8923
PC        .7266    .6261    .6837
NO        .8670    .7523    .9305
CS        .8360    .7115    .8806
AS        .8258    .7998    .9029
MK        .8842    .8656    .8952
MC        .7309    .7803    .8977
EI        .8535    .7351    .8409

1 Source: Friedman, Streicher, Wing & Grafton, 1982
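
The correlations in Appendix A are described as corrected for restriction in range, but this section does not say which correction was applied. As a hedged illustration only, the sketch below implements one common univariate form (often called Thorndike's Case 2), which rescales an observed correlation by the ratio of the unrestricted to the restricted standard deviation. The function name and the example values are hypothetical, and the original analysis may well have used a different (for example, multivariate) procedure.

```python
import math


def correct_for_range_restriction(r, sd_restricted, sd_unrestricted):
    """Univariate correction for direct range restriction (assumed form).

    r               observed correlation in the restricted sample
    sd_restricted   standard deviation in the restricted sample
    sd_unrestricted standard deviation in the reference population
    """
    k = sd_unrestricted / sd_restricted
    return r * k / math.sqrt(1.0 - r * r + r * r * k * k)


# Hypothetical values: an observed r of .50 in a sample whose SD is
# half that of the reference population.
print(round(correct_for_range_restriction(0.50, 5.0, 10.0), 3))  # 0.756

# With equal standard deviations the correlation is left unchanged.
print(correct_for_range_restriction(0.50, 10.0, 10.0))  # 0.5
```

Because the present sample was less dispersed than the national reference population, a correction of this kind raises the observed correlations, which is why the appendix values exceed the uncorrected subtest correlations in Table 3.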


